Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
نویسندگان
چکیده
Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning.
منابع مشابه
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
متن کاملCity-Region as the Extension of Urban Fabric
Introduction City-region qua a system of concepts has a theoretical, empirical and epistemological significance. On the strength of such a capacity, this system is applied to better understand the structures of production of space in different scales (global, international, national, regional and urban). Materials and Methods Using Henri Lefebvre’s conception of Urban Fabric, this paper tr...
متن کاملEvaluation of Updating Methods in Building Blocks Dataset
With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution is to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...
متن کاملAn Empirical Assessment of Semantic Interpretation
We introduce a framework for semantic interpretation in which dependency structures are mapped to conceptual representations based on a parsimonious set of interpretation schemata. Our focus is on the empirical evaluation of this approach to semantic interpretation, i.e., its quality in terms of recall and precision. Measurements are taken with respect to two real-world domains, viz. informatio...
متن کاملارتقای کیفیت دستهبندی متون با استفاده از کمیته دستهبند دو سطحی
Nowadays, the automated text classification has witnessed special importance due to the increasing availability of documents in digital form and ensuing need to organize them. Although this problem is in the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown a better performance than the others. I...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning
دوره 2012 شماره
صفحات -
تاریخ انتشار 2012